Abstract

Quantum circuits are time-dependent diagrams describing the process of quantum computation. Usually, a quantum algorithm must be mapped into a quantum circuit. Optimal synthesis of quantum circuits is intractable and heuristic methods must be employed. With the use of heuristics, the optimality of circuits is no longer guaranteed. In this paper, we consider a local optimization technique based on templates to simplify and reduce the depth of non-optimal quantum circuits. We present and analyze templates in the general case, and provide particular details for the circuits composed of NOT, CNOT and controlled-sqrt-of-NOT gates. We apply templates to optimize various common circuits implementing multiple control Toffoli gates and quantum Boolean arithmetic circuits. We also show how templates can be used to compact the number of levels of a quantum circuit. The runtime of our implementation is small while the reduction in the number of quantum gates and the number of levels is significant.

1 Introduction

Research in quantum circuit synthesis is motivated by the growing interest in quantum computation [19] and advances in experimental implementations [4] [T] E] [25]. In realistic devices, experimental errors and decoherence introduce errors during computation. Therefore, to obtain a robust implementation, it is imperative to reduce the number of gates and the overall running time of an algorithm. The latter can be done by parallelizing the circuit (compacting its levels) as much as possible.

Even for circuits involving only a few variables, it is at present intractable to find an optimal implementation. Thus a number of heuristic synthesis methods have emerged. Application of these methods usually results in a non-optimal circuit that can be simplified with local optimization techniques. Additionally, some quantum circuits for important classes of functions, such as adders and modular exponentiation, were created and compacted in an ad-hoc manner

lEIlTl. 

Local optimization has only recently been considered as a possible tool for gate count reduction in quantum [13] and reversible (quantum Boolean) circuits [10]. Some quantum circuit identities that could be used for circuit simplification can be found in [19]. While these provide several rewriting rules with no ready-to-use algorithm for their application, there is clearly a benefit in a systematic approach through the use of templates discussed in this paper. A somewhat different approach for local optimization of reversible NOT-CNOT-Toffoli circuits was applied for the simplification of random reversible circuits in [22]. That approach and our template method are difficult to compare as they have been applied to different types of circuits with different metrics for the circuit cost.

So far, CAD tool designers have spent little effort on minimizing the number of logic levels in quantum circuits. However, reducing the number of levels shortens the running time, since it amounts to parallelizing the algorithm. More importantly, in the popular quantum error model where errors appear randomly with time, a parallel circuit helps to reduce the errors. For instance, it may be possible to use a smaller number of error correction code concatenations (each of which is a very expensive operation, requiring at least tripling the number of physical qubits [19]) if the circuit is well parallelized. To the best of our knowledge, all presently existing quantum circuits were at best compacted in an ad-hoc fashion. In our work we automate level compaction through the use of templates.

Methods based on templates have been considered for Toffoli reversible network simplification [14]. In this paper, we revisit the definition of templates and show how they can be applied in the quantum case as a systematic basis for quantum circuit simplification and level compaction.

The paper¹ is organized as follows. We start with a brief overview of the necessary background in Section 2. In Section 3 we define the templates and discuss some of their properties. In Section 4 we present a method to identify the templates and describe two algorithms, one to reduce the cost and the other to reduce the number of logic levels of quantum circuits. We next choose a specific quantum gate library and illustrate the effectiveness of the above approach. Section 5 presents a set of small quantum templates for NOT, CNOT and controlled-sqrt-of-NOT gates and illustrates the algorithms. The benchmark results presented in Section 6 are divided into two parts. We first optimize quantum implementations of the multiple control Toffoli gates (including multiple control Toffoli gates with negative controls) and then consider optimization of some NOT-CNOT-Toffoli circuits available through existing relevant literature. Discussion of future work and concluding remarks are found in Sections 7 and 8.

2 Background 

We present a short review of the basic concepts of quantum computation necessary for this paper. An in-depth coverage can be found in [19].

The state of a single qubit is a linear combination $\alpha|0\rangle + \beta|1\rangle$ (also written as a vector $(\alpha, \beta)$) in the basis $\{|0\rangle, |1\rangle\}$, where $\alpha$ and $\beta$ are complex numbers called the amplitudes, and $|\alpha|^2 + |\beta|^2 = 1$. Real numbers $|\alpha|^2$ and $|\beta|^2$ represent the probabilities $p$ and $q$ of reading the logic states $|0\rangle$ and $|1\rangle$ upon measurement. The state of a quantum system with $n > 1$ qubits is given by an element of the tensor product of the single state spaces and can be represented as a normalized vector of length $2^n$, called the state vector. Quantum system evolution allows changes of the state vector through its multiplication by $2^n \times 2^n$ unitary matrices called gates.

The above models how a transformation can be performed, but does not indicate how to identify the unitary 
operations that compose the transformation or how to implement them. Efficiency of the physical implementation 
depends on the system's Hamiltonian, and the details of different systems (and associated gate costs) are not a focus 
of this paper. Typically, certain primitive gates are used as elementary building blocks [19] [2]. Among these are:

• NOT ($x \mapsto \bar{x}$) and CNOT ($(x, y) \mapsto (x, x \oplus y)$) gates, where $x, y \in \{0, 1\}$ and $\oplus$ is addition modulo 2;

• Hadamard gate defined by $H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$;

• controlled-V gate that, depending on the value of its control qubit, changes the value of the target qubit using the transformation given by the matrix $V = \frac{i+1}{2} \begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}$;

• controlled-V+ gate that, depending on the value of its control qubit, changes the value of the target qubit using the transformation $V^+ = V^{-1}$;

• rotation gates $R_a(\gamma)$, $\gamma \in [0, 2\pi]$, $a \in \{x, y, z\}$.
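The gate definitions above can be checked numerically. The following is a minimal sketch in plain Python (the helper names `matmul`, `dagger`, `close` and `controlled` are ours, not part of the paper); it assumes the standard square-root-of-NOT matrix for V and confirms that V applied twice gives NOT and that V+ undoes V.

```python
from math import sqrt

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def dagger(a):
    """Conjugate transpose, which is the inverse of a unitary matrix."""
    n = len(a)
    return [[complex(a[c][r]).conjugate() for c in range(n)] for r in range(n)]

def close(a, b, tol=1e-12):
    """Entrywise comparison up to floating-point tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def controlled(u):
    """4x4 matrix applying the 2x2 gate u to the target when the control is |1>."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, u[0][0], u[0][1]],
            [0, 0, u[1][0], u[1][1]]]

I2 = [[1, 0], [0, 1]]
NOT = [[0, 1], [1, 0]]
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
s = (1 + 1j) / 2
V = [[s, -1j * s], [-1j * s, s]]      # assumed form of the V matrix
V_PLUS = dagger(V)                    # V+ is the inverse of V
CNOT = controlled(NOT)

checks = {
    "V V = NOT": close(matmul(V, V), NOT),
    "V V+ = I": close(matmul(V, V_PLUS), I2),
    "V+ V+ = NOT": close(matmul(V_PLUS, V_PLUS), NOT),
    "H H = I": close(matmul(H, H), I2),
    "CV CV = CNOT": close(matmul(controlled(V), controlled(V)), CNOT),
}

# Measuring V|0>: the amplitudes are the first column of V.
probabilities = [abs(V[0][0]) ** 2, abs(V[1][0]) ** 2]
print(checks, probabilities)
```

Both measurement probabilities come out as one half, which is what one expects from a gate that is "half" of a NOT.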

We shall write $G^{-1}$ to denote the gate implementing the inverse function of the function realized by gate G. In context, we will use G to mean a gate or the transformation matrix for that gate. The circuit diagrams are built in the popular notation, such as that used in [19]. In short, horizontal "wires" represent a single qubit each; time in the circuit diagrams propagates from left to right; (positive) gate controls are depicted with •; targets appear as ⊕ for NOT and CNOT gates, as a boxed V for the controlled-V gate, and as a boxed V+ for the controlled-V+ gate, with vertical lines joining the control(s) of a gate with its target.

The principle of the optimization method is to associate a cost with each of these elementary gates and lower the overall circuit cost by reducing the number of high cost gates. The cost definition must reflect how difficult it is to implement the gate, and therefore will depend on the details of the physical device considered to implement the circuit. For example, for NMR techniques the cost of the gate must take into account the number of rf-pulses as well as the duration of the interaction periods necessary to implement the gate [4]. In a setting guided by the Ising type Hamiltonian in a weak coupling regime (such as liquid NMR [4] and superconductors [6]) a controlled-V and its complex conjugate must be associated with approximately half the cost of a CNOT gate each. Thus, controlled-V and controlled-V+ are not at all complex gates. Two qubit gate implementation costs in any given Hamiltonian can be found using the technique discussed in [26].

¹ A preliminary version of this work was presented at the DATE-2005 conference. This paper discusses circuit parallelization, reports the improved results based on a new and significantly more efficient implementation, includes extensive testing results, as well as certain new discussions.

The Toffoli gate [24] and its generalization with more than two controls serve as a good basis for synthesis purposes. Indeed, every reversible (quantum Boolean) function can be realized as a cascade of multiple control Toffoli gates [2, 14]. The multiple control Toffoli gate flips the target bit if the control bits are in a given Boolean state. Unfortunately, multiple control Toffoli gates (including the original Toffoli gate [24]) are not simple transformations in quantum technologies. They require a number of elementary quantum operations, and Toffoli gates with a large number of controls can be quite costly [2]. However, they can be implemented using circuits composed of 3-qubit Toffoli gates [2]. Finally, the 3-qubit Toffoli gate can be constructed from a set of gates that includes the NOT, CNOT, controlled-V and controlled-V+. We therefore consider all these gates in Section 5 when we search for templates to simplify the best known quantum circuits implementing large Toffoli gates and reversible functions. In addition, any unitary can be synthesized as a generic quantum circuit through exploring the properties of matrix decompositions [2] [21] [18]. We do not consider those circuits here, but point out that our circuit simplification techniques are applicable in any of the above cases.

3 Templates: definition 

To decrease the cost of a circuit, the basic idea is to replace a sub-circuit with an equivalent one that has lower cost. 
We will call this procedure the application of a rewriting rule. Some problems arise with this technique: 

• In general, even for simple circuits, if rotation gates with any parameter $\gamma$ are allowed the number of possible rewriting rules is infinite.

• Equivalent circuits with the same cost might require different sets of rewriting rules to be simplified.

• A sub-circuit may be rewritten in another form having the same cost, but this second form could allow extra simplifications of the circuit using other rewriting rules.

One of the problems arising from these considerations is to minimize the number of rewriting rules by keeping 
only "essential" ones. To address these issues we introduce the notion of templates that will be applicable to all 
quantum gate libraries and discuss the algorithms for quantum gate reduction and level compaction. 

Definition: A size m template is a sequence of m gates that implements the identity operator and that satisfies the 
following constraint: any template of size m must be independent of all templates of smaller or equal size, i.e. for a 
given template T of size m no application of any set of templates of smaller or equal size can decrease the number of 
gates in T or make it equal to another template. 

A template can be seen as a generalization of the rewriting rules, since rewriting rules can be derived from it. For example, forward application of the template $G_0 G_1 \ldots G_{m-1} = I$ allows us to find a rewriting rule of the form $G_i G_{(i+1) \bmod m} \ldots G_{(i+p-1) \bmod m} \rightarrow G^{-1}_{(i-1) \bmod m} G^{-1}_{(i-2) \bmod m} \ldots G^{-1}_{(i+p) \bmod m}$, where $0 \le i, p \le m-1$. Similarly, backward application of the template is a rewriting rule of the form $G^{-1}_i G^{-1}_{(i-1) \bmod m} \ldots G^{-1}_{(i-p+1) \bmod m} \rightarrow G_{(i+1) \bmod m} G_{(i+2) \bmod m} \ldots G_{(i-p) \bmod m}$, where $0 \le i, p \le m-1$.

Template application requires that the inverse of each gate be available. Clearly, templates are a more compact way of representing non-redundant rewriting rules, as a single template of size $m$ is capable of storing up to $2m^2$ rewriting rules.

See Appendix for a proof of the effect of the forward and backward applications of templates. 
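To make the forward and backward applications concrete, here is a minimal sketch that enumerates every rewriting rule stored in one identity sequence and checks each rule by matrix multiplication. The single-qubit sequence V, V, NOT is only an illustrative stand-in chosen by us (it is not one of the paper's templates), and all function names are hypothetical. Circuits are read left to right, so the leftmost gate is applied first.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(a):
    """Conjugate transpose, i.e. the inverse of a unitary gate."""
    return [[complex(a[c][r]).conjugate() for c in range(2)] for r in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def circuit_matrix(gates):
    """Matrix of a gate sequence; the leftmost gate acts first."""
    m = [[1, 0], [0, 1]]
    for g in gates:
        m = matmul(g, m)
    return m

def forward_rule(template, i, p):
    """G_i ... G_(i+p-1)  ->  G_(i-1)^-1 G_(i-2)^-1 ... G_(i+p)^-1 (indices mod m)."""
    m = len(template)
    lhs = [template[(i + k) % m] for k in range(p)]
    rhs = [dagger(template[(i - 1 - k) % m]) for k in range(m - p)]
    return lhs, rhs

def backward_rule(template, i, p):
    """G_i^-1 ... G_(i-p+1)^-1  ->  G_(i+1) G_(i+2) ... G_(i-p) (indices mod m)."""
    m = len(template)
    lhs = [dagger(template[(i - k) % m]) for k in range(p)]
    rhs = [template[(i + 1 + k) % m] for k in range(m - p)]
    return lhs, rhs

s = (1 + 1j) / 2
V = [[s, -1j * s], [-1j * s, s]]
NOT = [[0, 1], [1, 0]]
template = [V, V, NOT]          # illustrative identity sequence: V V NOT = I
m = len(template)

rules = [rule(template, i, p)
         for rule in (forward_rule, backward_rule)
         for i in range(m) for p in range(m)]
all_valid = all(close(circuit_matrix(lhs), circuit_matrix(rhs)) for lhs, rhs in rules)
print(len(rules), all_valid)
```

The count of generated (i, p) pairs in both directions equals 2m², which matches the bound on the number of rewriting rules a size m template can store.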

4 Templates: application 

In this section we present a method to find and classify the templates and introduce two algorithms using them. One is 
an algorithm for quantum cost reduction and the other for quantum circuit level compaction, both based on the notion 
of the templates. 



4.1 Template identification 

First we find all templates of the form $AA^{-1}$ (length 2), which we call gate-inverse rules. This is straightforward, since every self-inverse gate A forms the template AA and every pair of gates A and B, where $B = A^{-1}$, forms one template of the form AB.

Subsequent templates are found by identifying increasingly longer sequences of gates that realize the identity 
function and that can not be reduced by other available templates. 

Templates of the form ABAB (length 4) with $A = A^{-1}$ and $B = B^{-1}$ applied for parameter $p = 2$ result in construction of the rewriting rules $AB \rightarrow BA$ and $BA \rightarrow AB$. That is, they define the conditions under which two gates commute. We call such templates moving rules and apply them to move gates to form matches leading to reduction via other templates.

For applications, we suggest seeking a complete classification of the templates of small size and then supplementing those by a set of templates that appear to be useful when a specific synthesis procedure is applied. For example, if a synthesis procedure (or the circuit types one considers) tends to use a specific type of sub-circuit of cost $\mu$ which is neither optimal (assume an optimal cost of $\nu$) nor can be simplified by a small size complete set of templates, a template with total cost $\mu + \nu$ can be created (followed by a generalization process when and if needed). In this paper, we do not construct any of these supplementary type templates, since we apply templates to the circuits from different authors obtained from different synthesis procedures.

4.2 Cost reduction 

In this section, we present an algorithm to reduce generic quantum circuit cost using the templates. To apply the 
algorithms to a specific physical implementation, we only need to choose a relevant cost definition. 

Input: A quantum circuit specification, i.e. a sequence of gates $C_1 C_2 \ldots C_n$.

Output: A quantum circuit computing the same function as the input circuit, but having a possibly lower cost.

Algorithm: 

1. Let $C_k$ be the start gate in the circuit for a potential template match. Initially $k = 2$.

2. We attempt to match the templates in order of size (excluding the moving rules). The attempt to match a size $m$ template $G_0 G_1 \ldots G_{m-1}$ proceeds as follows:

(a) Forward matching: Apply the moving rules to arrange the gates preceding $C_k$ to be able to match them with the given size $m$ template. At this step we determine a pair $(j, p)$ such that $C_{k-i} = G_{(j-i) \bmod m}$, $0 \le i < p$. When such a $j$ and $p$ are found, gates $C_{k-p+1}, C_{k-p+2}, \ldots, C_k$ can be replaced by the sequence $G^{-1}_{(j-p-i) \bmod m}$, $0 \le i < m - p$. Substitution is done if it is beneficial from the point of view of the overall circuit cost reduction.

(b) Backward matching: To backward match a size $m$ template, the same procedure applies with the following matching condition: $C_{k-i} = G^{-1}_{(j+i) \bmod m}$, $0 \le i < p$. Then, gates $C_{k-p+1}, C_{k-p+2}, \ldots, C_k$ can be replaced by the sequence $G_{(j+p+i) \bmod m}$, $0 \le i < m - p$. The decision to replace or not is based on a chosen circuit cost metric.

3. We propagate this procedure through the circuit:

• If a template substitution was made, then $k$ is set to the index of the leftmost gate substituted and we repeat step 2.

• Otherwise, if we can, increment $k$ by 1 and repeat step 2. If we cannot because $C_k$ is already the rightmost gate in the circuit, the algorithm terminates.

The gate replacement at step 2 is performed when it is beneficial, i.e. when the total circuit cost is reduced. This imposes extra constraints on the parameter $p$ depending on the exact cost definition. For instance, with the simple gate count cost metric, $p$ must be greater than $m/2$. If many pairs $(j, p)$ are found, the one associated with the biggest cost reduction is chosen for the gate substitution. However, even if the total cost after template application stays the same (for the simple gate count cost metric this means applying an even size template by replacing one half with the other half, i.e. for even $m$ and $p = m/2$) the substitution can be beneficial, as the new circuit arrangement may allow other cost reducing template applications. We take this into account by allowing such "cost retaining" template applications as long as $k > Flag$ ($k$ is the value of the subscript of $C_k$), with the Flag initially set to 0. After each cost retaining template application the Flag is set to the current $k$ value, and after each cost reducing template application the Flag is set back to 0. This guarantees that the cost reduction algorithm will not run into an infinite loop while allowing cost retaining template applications.
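The control flow of the algorithm can be illustrated on its simplest special case. The sketch below is our own simplification, not the authors' implementation: it uses only the size 2 gate-inverse templates together with the NCV moving rule stated in Section 5, takes plain gate count as the cost, and omits larger templates and the Flag mechanism. The `Gate` class and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    kind: str                          # "NOT", "CNOT", "V" or "V+"
    target: str
    controls: frozenset = frozenset()

def inverse(g):
    """NOT and CNOT are self-inverse; V and V+ are inverses of each other."""
    swap = {"V": "V+", "V+": "V"}
    return Gate(swap.get(g.kind, g.kind), g.target, g.controls)

def commute(a, b):
    """NCV moving rule: neither target is a control of the other gate."""
    return a.target not in b.controls and b.target not in a.controls

def cancel_inverse_pairs(circuit):
    """Delete gate-inverse pairs that the moving rules can bring together."""
    gates = list(circuit)
    k = 1                              # index of the start gate C_k (0-based)
    while k < len(gates):
        j = k - 1
        # Scan left past gates that C_k commutes with, looking for its inverse.
        while (j >= 0 and gates[j] != inverse(gates[k])
               and commute(gates[j], gates[k])):
            j -= 1
        if j >= 0 and gates[j] == inverse(gates[k]):
            del gates[k], gates[j]     # k > j, so index k is removed first
            k = max(j, 1)              # resume at the leftmost affected position
        else:
            k += 1
    return gates

cnot_ab = Gate("CNOT", "b", frozenset({"a"}))
v_cb = Gate("V", "b", frozenset({"c"}))
v_plus_cb = Gate("V+", "b", frozenset({"c"}))
not_a = Gate("NOT", "a")

reducible = [cnot_ab, v_cb, cnot_ab, v_plus_cb]   # everything cancels
blocked = [cnot_ab, not_a, cnot_ab]               # NOT on the control blocks the move
print(cancel_inverse_pairs(reducible), len(cancel_inverse_pairs(blocked)))
```

In the first circuit the two CNOT gates share a target with the controlled-V gate between them, so they can be moved together and deleted, after which the V and V+ pair cancels as well. In the second circuit the NOT gate acts on the control of the CNOT gates, so no moving rule applies and the circuit is left unchanged.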

In Section 5 we illustrate how the templates are applied to reduce the gate count.

4.3 Level compaction 

We next suggest a greedy algorithm for quantum circuit level compaction employing templates. A level is defined as a sub-sequence of commuting gates that can be applied in parallel. Level compaction helps to increase the parallelization of the circuit implementation and therefore not only optimizes the runtime of the circuit, but also helps to decrease the decoherence effects by shortening the overall execution time². For simplicity of the algorithm description, we assume that all gates have the same duration; therefore, the execution time of a level is equal to a single gate duration. We also assume that neighboring gates operating on disjoint qubit subsets can always be applied in parallel, which is a common assumption for quantum technologies.

Input: A quantum circuit specification, i.e. a cascade of gates $C_1 C_2 \ldots C_n$.

Output: A re-organized circuit with possibly fewer levels computing the same function as the input circuit. 

Algorithm: The principle is to assign a specific level to each gate. 

1. Initially, all gates in the circuit have an undefined level, $i = 1$, and we define $Qlevel_i$ as an empty set.

2. Consider $C_j$, the leftmost gate not yet assigned a level. Assign it level $i$.

3. Until each gate $C_k$ right of $C_j$ is considered:

(a) If gate $C_k$ does not share common qubits with any of the gates in level $Qlevel_i$ and gate $C_k$ can be moved left (using the moving rules) until it is adjacent to the leftmost gate with level $i$, then assign gate $C_k$ level $i$.

(b) If it is not possible to move gate $C_k$ as just described, apply templates using the algorithm described above using $C_k$ as the start gate and considering only those gates whose level has not been assigned yet. Only templates with an even number $m$ of gates are applied and only substitutions of $m/2$ for $m/2$ gates are made. Such a substitution may allow a gate subsequently (possibly with movement) to be assigned to level $i$.

4. If there still remain gates not assigned a level, add 1 to $i$ (the number of levels), consider a new empty $Qlevel_i$ and repeat steps 2 to 3.

At this stage of development, the level compaction algorithm is greedy. We expect that it can likely be improved. 
However, our tests have shown that its current performance already improves relevant quantum circuits. 
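A minimal sketch of the greedy level assignment follows. It is our own simplification under stated assumptions: only step 3(a) is implemented (gates are moved with the NCV moving rule of Section 5), the template substitutions of step 3(b) are left out, and the `Gate` tuple and function names are hypothetical.

```python
from typing import NamedTuple

class Gate(NamedTuple):
    kind: str                          # "NOT", "CNOT", "V" or "V+"
    target: str
    controls: frozenset = frozenset()

def qubits(g):
    """All qubits a gate touches."""
    return {g.target} | set(g.controls)

def commute(a, b):
    """NCV moving rule: neither target is a control of the other gate."""
    return a.target not in b.controls and b.target not in a.controls

def compact_levels(circuit):
    """Greedy level assignment using moving rules only (no template step)."""
    remaining = list(circuit)
    levels = []
    while remaining:
        level = [remaining.pop(0)]     # leftmost unassigned gate opens the level
        used = qubits(level[0])
        skipped = []                   # gates deferred to later levels, in order
        for g in remaining:
            disjoint = used.isdisjoint(qubits(g))
            # g must be movable left past every deferred gate in front of it
            movable = all(commute(s, g) for s in skipped)
            if disjoint and movable:
                level.append(g)
                used |= qubits(g)
            else:
                skipped.append(g)
        remaining = skipped
        levels.append(level)
    return levels

circuit = [
    Gate("CNOT", "b", frozenset({"a"})),
    Gate("V", "d", frozenset({"c"})),
    Gate("CNOT", "c", frozenset({"b"})),
    Gate("NOT", "d"),
]
levels = compact_levels(circuit)
print(len(levels), [[g.kind for g in lvl] for lvl in levels])
```

Here four gates fit into two levels, because the first two gates act on disjoint qubits and so do the last two. Gates already placed in the current level never block a move, since a gate joins the level only if it shares no qubits with them.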

5 Quantum NCV templates 

We now present a set of quantum templates based on the NOT, CNOT and controlled-sqrt-of-NOT (NCV) gates. It contains:



² For instance, a liquid NMR circuit with a high degree of parallelization of single qubit rotations and ZZ-gates will be significantly shorter than its unparallelized version. Indeed, single qubit rotations on homonuclear spins are usually implemented by selective soft pulses sent sequentially to act on each spin. Nevertheless, if we want to act on all homonuclear spins in parallel, it is possible to use a single broad-band short pulse [4]. As for heteronuclear spins, modern spectrometers have several channels that can be used simultaneously. Therefore, one can rotate heteronuclear spins in parallel by pulsing on them in parallel.

More importantly, in a typical NMR system, the main time consuming gates are the interaction gates (ZZ gates). Because all the couplings are always on in a molecule, ZZ-gates naturally occur in parallel in the circuit. To apply a ZZ gate to a given pair of qubits, one needs to use refocusing techniques O involving pulses and delays to cancel all the ZZ-interactions but the desired one. Therefore, in most cases, regrouping the ZZ-gates makes it possible to optimize the refocusing scheme and reduce the overall number of required delays. In particular, a refocusing scheme exists for any subset of non-intersecting gates defined as a single logic level in this paper LIU.
Figure 1: Quantum templates other than the gate-inverse and moving rules. Each of these circuits implements the Identity.

Figure 2: Simplification of a 10-gate quantum network for the 3-qubit full adder.



• The gate-inverse rules: NOT and CNOT are self-inverses and controlled-V and controlled-V+ are the inverses of each other.

• The moving rule (replace AB with BA): assuming gate A has control set $C_A$ ($C_A$ is an empty set in the case of an uncontrolled gate) and target $T_A$ and gate B has control set $C_B$ and target $T_B$, these two gates form a moving rule if, and only if, $T_A \notin C_B$ and $T_B \notin C_A$.

• Larger templates: all other templates that we have identified are shown in Figure 1, where $V$ (alternatively $V^+$) is substituted for all occurrences of $V_0$ and $V^+$ (alternatively $V$) is substituted for all occurrences of $V_1$, i.e. the substitution is consistent and distinct for $V_0$ and $V_1$. The templates reported here were found by inspection. We are currently developing a program to find larger templates systematically and to verify completeness of the current set.
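The moving rule can be checked by brute force on a small register. The sketch below is an illustration under our own assumptions (three qubits, at most one control per gate, the standard V matrix, hypothetical helper names): it builds the 8x8 matrix of every NOT, CNOT, controlled-V and controlled-V+ gate and compares actual commutation with the control/target condition.

```python
def gate_matrix(u, target, control, n=3):
    """2^n x 2^n matrix of the 2x2 gate u on `target`, optionally controlled."""
    dim = 2 ** n
    mat = [[0j] * dim for _ in range(dim)]
    for col in range(dim):
        bits = [(col >> (n - 1 - q)) & 1 for q in range(n)]
        if control is not None and bits[control] == 0:
            mat[col][col] = 1          # control is |0>: gate acts as identity
            continue
        for out_bit in (0, 1):
            row_bits = bits.copy()
            row_bits[target] = out_bit
            row = sum(b << (n - 1 - q) for q, b in enumerate(row_bits))
            mat[row][col] += u[out_bit][bits[target]]
    return mat

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def moving_rule(a, b):
    """T_A not in C_B and T_B not in C_A; a gate is (name, target, control)."""
    _, ta, ca = a
    _, tb, cb = b
    return ta != cb and tb != ca

s = (1 + 1j) / 2
X = [[0, 1], [1, 0]]
V = [[s, -1j * s], [-1j * s, s]]
V_PLUS = [[s.conjugate(), 1j * s.conjugate()], [1j * s.conjugate(), s.conjugate()]]
matrices_2x2 = {"NOT": X, "CNOT": X, "V": V, "V+": V_PLUS}

# Enumerate all NCV gates on three qubits (NOT has no control).
gates = [("NOT", t, None) for t in range(3)]
gates += [(name, t, c) for name in ("CNOT", "V", "V+")
          for t in range(3) for c in range(3) if c != t]
full = {g: gate_matrix(matrices_2x2[g[0]], g[1], g[2]) for g in gates}

mismatches = [(a, b) for a in gates for b in gates
              if close(matmul(full[a], full[b]), matmul(full[b], full[a]))
              != moving_rule(a, b)]
print(len(gates), len(mismatches))
```

On this small register the condition and actual commutation agree for every pair of gates, which is consistent with the "if, and only if" in the statement of the moving rule.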

To illustrate how templates are applied, consider the quantum circuit for the 3-input full adder with 10 gates from [9]. The circuit is built on four qubits as the 3-input adder must be extended to a 4-variable reversible function. Note that the original circuit presented in [9] gives 1111 as output for input pattern 0100 instead of the expected 1011. The circuit shown in Figure 2A corrects this.

In the circuit in Figure 2A, gates 5 and 7 (counting from the left) can be moved together and form a gate-inverse pair. We move them together and delete them by applying the gate-inverse rule. This results in the circuit illustrated in Figure 2B.

Next, we notice that gates 4, 6 and 8 in this circuit can also be brought together (gates 4 and 8 should be moved towards gate 6). Figure 2C shows the three gates brought together, and Figure 2D illustrates the resulting circuit after the size 5 template is applied.

The circuit that we found using template simplification (Figure 2D) is equivalent to the optimal one (for a given input-to-output assignment) reported in [9]. It took our program < 0.001 seconds (elapsed time on a 1.8 GHz Athlon XP2400+ machine with 512 MB RAM running Windows) to simplify the circuit in Figure 2A into the circuit in Figure 2D. The time reported in [9] to synthesize such a circuit is 7 hours. This example clearly shows that templates are useful and effective.

A likely optimal quantum circuit for the 3-input full adder can be constructed from its well-known reversible implementation illustrated in Figure 3A. We first substitute quantum circuits for the Peres gates [20], each of which is a Toffoli-CNOT pair (see Figure 3B). We then apply the templates. In this case, gates 4 and 6 can be moved together and match the gate-inverse rule. So, they are both deleted, leading to the circuit in Figure 3C. Finally, we apply the level compactor and the circuit is transformed into the one illustrated in Figure 3D (different logic levels are separated by dashed vertical lines). The number of levels in the compacted circuit is 4, and this is the minimum, since 4 gates have their targets on the same qubit.

Figure 3: Simplification of an 8-gate quantum circuit for the 3-qubit full adder.

5.1 Other templates 

It is possible to construct the templates in other gate libraries and then use the discussed cost reduction and level compaction algorithms verbatim. Constructing the templates for the finite (those seem to be more physical) gate libraries may be reduced to finding the rewriting rules by hand and generalizing them into the templates, or running a computer search. A parameterization/classification of the templates in this case may be helpful. However, in the libraries with an infinite number of gates a classification is necessary. We suggest that each template (template class) is written in the circuit form and followed by an algebraic expression conditional upon which the template applies. For example, in the library with single qubit rotations and CNOT gates the following template might be constructed: $R_a(\alpha) R_a(\beta) R_a(\gamma)$, where $\alpha + \beta + \gamma = 0$. Application of such a template can be thought of as finding two single qubit rotations about the same axis (not necessarily the conventional X, Y or Z, but a possible combination of them) that can be commuted until they are neighbors, and then replacing them by a single cumulative rotation. Another example of a template for this gate library could be $R_x(\alpha) R_z(-\pi/2) R_y(\alpha) R_z(\pi/2)$, which could be used to replace some three gates with one, or, for instance, eliminate all $R_x$ gates from a given circuit. In the gate library with controlled gates, the following template is possible: $CU(b,c)\, CNOT(a,b)\, CU^{-1}(b,c)\, CNOT(a,b)\, CU(a,c)$, conditional upon gate $CU$ being a self inverse. This template is a generalization of the one used in this paper (third template in Figure 1), but it captures an infinite number of the rewriting rules. Other templates are possible and depend on the gate base considered. The discussed examples are not intended to be treated as a complete review of the possible templates, but rather as an illustration of what kind of templates may be constructed.
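The two rotation templates can be verified numerically. The sketch below rests on assumptions that the text does not fix: rotations are taken as $R_a(\theta) = \exp(-i\theta\,\sigma_a/2)$, the second template is read as $R_x(\alpha) R_z(-\pi/2) R_y(\alpha) R_z(\pi/2)$, and gate sequences are read in circuit order (leftmost gate applied first). Function names and the sample angles are ours.

```python
from math import cos, sin, pi, sqrt

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rotation(axis, theta):
    """exp(-i*theta/2 * (axis . sigma)) for a unit-length axis (assumed convention)."""
    nx, ny, nz = axis
    c, s = cos(theta / 2), sin(theta / 2)
    return [[c - 1j * s * nz, -1j * s * (nx - 1j * ny)],
            [-1j * s * (nx + 1j * ny), c + 1j * s * nz]]

def circuit_matrix(gates):
    """Matrix of a gate sequence; the leftmost gate acts first."""
    m = [[1, 0], [0, 1]]
    for g in gates:
        m = matmul(g, m)
    return m

I2 = [[1, 0], [0, 1]]
X_AXIS, Y_AXIS, Z_AXIS = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Template 1: three rotations about one (arbitrary) axis whose angles sum to zero.
axis = (1 / sqrt(3), 1 / sqrt(3), 1 / sqrt(3))
alpha, beta = 0.7, -1.9
gamma = -(alpha + beta)
same_axis_ok = close(
    circuit_matrix([rotation(axis, t) for t in (alpha, beta, gamma)]), I2)

# Template 2: Rx(alpha) Rz(-pi/2) Ry(alpha) Rz(pi/2) in circuit order.
second = [rotation(X_AXIS, alpha), rotation(Z_AXIS, -pi / 2),
          rotation(Y_AXIS, alpha), rotation(Z_AXIS, pi / 2)]
second_ok = close(circuit_matrix(second), I2)
print(same_axis_ok, second_ok)
```

Under these conventions the three Z and Y rotations of the second sequence combine into $R_x(-\alpha)$, which is why one gate can stand in for three. With a different sign convention or reading order the roles of the two $R_z$ angles would swap.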

6 Numerical results 

Reversible logic and quantum arithmetic circuits are often specified with NOT, CNOT and Toffoli gates ll2l l5l [TSl [TtI 
[19], rather than with gates from the NCV set. Circuits with multiple control Toffoli gates have been studied extensively and synthesis procedures exist. To process these circuits we need to transform every Toffoli gate into a circuit with NOT, CNOT, controlled-V and controlled-V+ gates. We use the circuit in Figure 4A for this purpose. Due to the symmetry properties of the NCV and Toffoli gates (interchangeability of the controlled-V and controlled-V+ gates in quantum NCV circuits for reversible functions [16], symmetry of Toffoli gate controls, and the self-inverse property of the Toffoli gate), there exist 8 distinct but equivalent NCV circuits for a Toffoli gate. In our procedure we use only two of them: the circuit in Figure 4A and its inverse, and keep the one resulting in a better circuit simplification. Empirical tests have shown that the use of the other six transformations does not yield any new improvements.
Figure 4: Optimal NCV circuits for A: the 3-qubit Toffoli gate [19] and B: the 3-qubit Toffoli gate with a single negative control [16].



Table 1: Simplification of the multiple control Toffoli gate implementations by Barenco et al. [2] and Asano et al. [1]. The results are grouped in two tables according to the source of the initial circuit. Columns Size and Ancilla show the size (n-qubit gate) of the multiple control Toffoli gate and the number of ancilla qubits associated with the implementation of this gate. Columns [citation] GC and [citation] D present the gate count (GC) in the best reported quantum NCV circuit taken from the appropriate source indicated in [citation], and the corresponding circuit depth (D). We show the gate counts and circuit depth for our optimized implementations in columns Opt-d GC and Opt-d D. Whenever columns [citation] D and Opt-d D are not present, the depth equals the number of gates both in the circuit before optimization and in the circuit after optimization.

6.1 Multiple control Toffoli gate simulations

Multiple control Toffoli gates and their variants with negated controls are a popular basis for the synthesis of reversible circuits and are often used to construct quantum circuits. For instance, multiple control Toffoli gates are used in quantum error correcting circuits right after the syndrome has been found, in order to correct errors [19]. Even more importantly, multiple control Toffoli gates are at the heart of the amplitude amplification technique [3] that is often considered as a separate class of quantum algorithms, of which there are only a few. Thus, multiple control Toffoli gates are indispensable for quantum computations, and it is very important to have efficient quantum circuits for them. Implementations of multiple control Toffoli gates were studied in [1, 2]. In the following, we simplify and compact levels in the multiple control Toffoli gate circuits described in [1, 2] using our template based algorithms. We compare our results to those presented initially. Table 1 summarizes the results.

The results in Table 1 show that the realizations of size $n$ multiple control Toffoli gates with a gate count of $20n - 60$ (Lemma 7.2 in [2]) are always simplified to circuits with $12n - 34$ gates. Based on the regularity and predictability of this simplification, we conjecture this will always be the case. Further, our experiment showed that asymptotically we get a 40% reduction in the number of gates and in the number of logic levels required in simulation of the multiple control Toffoli gates. Similarly, circuits for multiple control Toffoli gates with $5(n-2)^2$ gates and depth $10n - 20$ seem to always simplify to circuits with $3.75(n-2)^2$ gates and depth $7.5n - 10$.
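The 40% figure follows directly from the two gate count formulas for the Lemma 7.2 circuits, since the ratio of the leading coefficients is 12/20. A short worked check (the function name is ours, and only the formulas 20n - 60 and 12n - 34 are taken from the text):

```python
def lemma_7_2_counts(n):
    """Gate counts before and after simplification, and the relative reduction."""
    before = 20 * n - 60
    after = 12 * n - 34
    return before, after, 1 - after / before

for n in (5, 10, 100, 10**6):
    before, after, reduction = lemma_7_2_counts(n)
    print(f"n={n}: {before} -> {after} gates, reduction {reduction:.2%}")
```

For small gates the saving is slightly below the limit (35% at n = 5) and it approaches 40% as n grows. Since the depth of these circuits equals their gate count, the same ratio applies to the number of logic levels.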

Multiple control Toffoli gates can be implemented with a single auxiliary qubit as discussed in Corollary 7.4 of [2]. Using our tool we achieved the upper bound $24n - 88$ (for $n > 5$) for the number of gates and the number of levels required in multiple control Toffoli gate simulations with a single auxiliary qubit using just the decomposition from Lemma 7.2 of [2]. We stress that the above formulas are upper bounds since we did not yet apply our techniques to simplify such circuits. There must be a clever approach in which both types of $n - 3$ auxiliary qubit decompositions are used in the construction due to Corollary 7.4 [2], and depending on whether the final gate count or depth needs to be optimized, the choice for a particular multiple control Toffoli gate substitution may vary.

Multiple control Toffoli gates with negations may also be useful in some applications. A canonical implementation
of such gates ([19], Figures 4.11 and 4.12) assumes a logic layer of NOT gates preparing the literals in the right
polarity, followed by a multiple control Toffoli gate with all positive controls and a level of NOT gates returning the
values of the literals to the positive polarity. This makes multiple control Toffoli gates with negative controls
marginally more expensive than multiple control Toffoli gates with only positive controls. In the following, we show
that a multiple control Toffoli gate with some but not all negative controls can be implemented at the same cost as a
multiple control Toffoli gate of the same size with only positive controls.
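
The canonical construction can be checked on classical basis states. The Python sketch below is an illustration under the assumption that inputs are computational basis states; the helper names `mct`, `canonical`, and `canonical_matches_direct` are hypothetical and do not come from any library.

```python
from itertools import product


def mct(bits, controls, target, negated=()):
    """Apply a multiple control Toffoli gate to a tuple of classical bits.

    Controls listed in `negated` fire on 0, all other controls fire on 1.
    """
    fires = all(bits[c] == (0 if c in negated else 1) for c in controls)
    out = list(bits)
    if fires:
        out[target] ^= 1
    return tuple(out)


def canonical(bits, controls, target, negated):
    """NOT layer, positive-control gate, NOT layer (the canonical construction)."""
    out = list(bits)
    for c in negated:
        out[c] ^= 1
    out = list(mct(tuple(out), controls, target))
    for c in negated:
        out[c] ^= 1
    return tuple(out)


def canonical_matches_direct(n_controls, negated):
    """Compare both constructions on every basis state of n_controls + 1 bits."""
    controls = tuple(range(n_controls))
    target = n_controls
    return all(
        canonical(b, controls, target, negated) == mct(b, controls, target, negated)
        for b in product((0, 1), repeat=n_controls + 1)
    )
```

In this canonical form each negated control costs two NOT gates, spread over two extra layers, which is the overhead the construction discussed next avoids when not all controls are negated.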

Given that the 3-qubit Toffoli gate with a single negated control can be implemented with the same (minimal)
number of gates as a 3-qubit Toffoli gate with positive controls [16] (see Figure 4), such a gate can be used in the
circuit proposed by Barenco et al. [2] to implement multiple control Toffoli gates with some but not all negations
with no cost overhead. Such a simulation is illustrated in Figure 5A.

Furthermore, such implementations of multiple control Toffoli gates with some but not all negative controls ([2],
Lemma 7.2) rely on a strategy to simplify and compact levels similar to the one used for multiple control Toffoli gates
with all positive controls. Therefore, each multiple control Toffoli gate with some but not all negative controls can be
implemented with (n - 3) auxiliary qubits, 12n - 34 CNOT, controlled-V and controlled-V⁺ gates, and 12n - 34 logic
levels (n > 3). Using the simulation illustrated in Figure 5B, one can construct a multiple control Toffoli gate with
some but not all negations that requires a single auxiliary qubit, no more than 24n - 88 gates and the same number of
logic levels (for n > 5).

An implementation of a multiple control Toffoli gate with all negations using (n - 3) auxiliary qubits will require
2 extra NOT gates; however, the number of levels will not increase. Similarly, a simulation of a multiple control
Toffoli gate with all negations using a single auxiliary qubit will require 4 extra NOT gates with no increase in the
number of logic levels (upper bound).
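
The counts stated in this subsection can be collected in one place. The following Python sketch only restates the formulas from the text; the function name and its parameters are hypothetical.

```python
def mct_upper_bound(n, single_auxiliary=False, all_negative=False):
    """Return the (gates, levels) counts quoted in the text for a size-n gate.

    With n - 3 auxiliary qubits the text gives 12n - 34 gates and levels (n > 3);
    with a single auxiliary qubit it gives the upper bound 24n - 88 (n > 5).
    Gates with some but not all negative controls cost the same as positive ones.
    """
    if single_auxiliary:
        if n <= 5:
            raise ValueError("the 24n - 88 bound is stated for n > 5")
        gates = levels = 24 * n - 88
        extra_nots = 4
    else:
        if n <= 3:
            raise ValueError("the 12n - 34 count is stated for n > 3")
        gates = levels = 12 * n - 34
        extra_nots = 2
    if all_negative:
        gates += extra_nots  # NOT gates are added, the level count is unchanged
    return gates, levels
```

For example, `mct_upper_bound(6)` gives 38 gates and 38 levels, while `mct_upper_bound(6, single_auxiliary=True, all_negative=True)` gives 60 gates and 56 levels.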

A similar argument holds in the case of the decomposition from [1], but we do not discuss this here. Rather, we
move on to considering other types of circuits.

Figure 5: Simulations, inspired by Barenco et al. [2], of a multiple control Toffoli gate with some but not all
negations, illustrated for the maximal possible number of negative controls. A: using n - 3 auxiliary qubits; B: using
a single auxiliary qubit.



6.2 Benchmark circuits 

Here we present the results of the application of the templates to a number of quantum circuits implementing various
reversible Boolean and quantum arithmetic functions that can be found in the literature. Many reversible/quantum
circuits have constant input values and garbage outputs. This typically occurs when a non-reversible function is mapped
to a reversible one prior to synthesis as a reversible circuit. In such cases, extra simplifications at the extremities
of the circuit can be performed:

• If a gate whose control is an input constant can be moved to the beginning of the circuit, then, depending on
whether the constant input controlling the gate is 0 or 1, the gate can be either deleted or uncontrolled (assuming
an uncontrolled gate has a lesser cost).

• If a gate with the target on a garbage output can be moved to the end, we can delete it, as we are not interested
in the value of the garbage output.
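
The two rules above can be sketched for circuits made of NOT, CNOT and Toffoli gates acting on classical basis states. This Python sketch is an assumption-laden illustration rather than the procedure used in the experiments: a gate is a hypothetical `(controls, target)` pair, and instead of testing whether a gate can be moved, it uses two conservative conditions (a constant line has not been targeted yet; a garbage line is not used as a control later).

```python
def simplify_constants(circuit, constants):
    """Delete or uncontrol gates whose control still holds a known constant.

    circuit: list of (controls, target) pairs; constants: {line: 0 or 1}.
    """
    known = dict(constants)
    out = []
    for controls, target in circuit:
        if any(known.get(c) == 0 for c in controls):
            continue  # a control is constant 0, so the gate never fires
        # a control that is constant 1 always fires and can be dropped
        controls = tuple(c for c in controls if known.get(c) != 1)
        out.append((controls, target))
        known.pop(target, None)  # conservatively forget a line once it is targeted
    return out


def drop_garbage_gates(circuit, garbage):
    """Delete gates that only affect a garbage output that is never read again."""
    needed_as_control = set()
    kept = []
    for controls, target in reversed(circuit):
        if target in garbage and target not in needed_as_control:
            continue  # nothing later depends on this line
        kept.append((controls, target))
        needed_as_control.update(controls)
    kept.reverse()
    return kept


if __name__ == "__main__":
    circuit = [(("w",), "x"), (("z",), "x"), (("x", "y"), "z"), (("x",), "y")]
    step1 = simplify_constants(circuit, {"w": 1, "z": 0})
    print(drop_garbage_gates(step1, {"y"}))
```

In the example, the CNOT controlled by the constant 1 becomes a NOT, the CNOT controlled by the constant 0 disappears, and the final CNOT onto the garbage line is removed.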

We took the circuits from [15], composed of NOT, CNOT, and Toffoli gates, and compared their quantum realization
costs (defined as the NCV gate count) before and after applying the templates. We also compacted levels in the
simplified circuits and report the number of levels we get. Since the authors of [15] do not compact levels in their
circuits, we have no comparisons for the number of levels. Table 2 summarizes the results.

Table 2: Simplification of the benchmark circuits from [15]. Circuit name appears in column Name and is taken
directly from [15]. Size indicates the number of qubits in the circuit. NCV GC lists the quantum NCV gate count when
the Toffoli gates in the corresponding circuit are substituted with their quantum implementations. Optimized NCV GC
and Levels show the quantum gate count and the number of logic levels after reversible gates are substituted with
their quantum circuits and the resulting circuit is run through the template simplification and then level compaction
processes. We do not report the runtimes in this table because all circuits were computed almost instantaneously.

| Name       | Size | NCV GC | Optimized NCV GC | Levels |
|------------|------|--------|------------------|--------|
| 2of5d2     | 7    | 40     | 29               | 25     |
| rd32       | 4    | 12     | 6                | 4      |
| 3_17tc     | 3    | 14     | 10               | 10     |
| 4_49-12-32 | 4    | 32     | 27               | 21     |
| 6symd2     | 10   | 72     | 53               | 27     |
| 9symd2     | 12   | 108    | 82               | 50     |
| mod5d1     | 5    | 24     | 14               | 9      |
| mod5d2     | 5    | 25     | 11               | 8      |
| mod5mils   | 5    | 13     | 9                | 5      |
| ham3tc     | 3    | 9      | 7                | 7      |
| ham7-25-49 | 7    | 49     | 40               | 28     |
| hwb4-11-23 | 4    | 23     | 21               | 16     |
| rd53d2     | 8    | 44     | 31               | 19     |
| rd73d2     | 10   | 76     | 55               | 34     |
| rd84d1     | 15   | 112    | 86               | 41     |

Let us describe the simplification procedure for one of these benchmark circuits: the 5-qubit oracle function
mod5. It leaves the first four inputs unchanged and inverts the last one if, and only if, the first four represent an
integer divisible by 5. We first found a Toffoli gate realization (circuit mod5mils in Table 2). We then applied the
template based optimization techniques described above. The resulting circuit is illustrated in Figure 6A. If the
inputs are not required to be passed through unchanged, the last three gates may be dropped. We next applied the level
compaction algorithm. The compacted version of the circuit in Figure 6A is illustrated in Figure 6B. Note how the level
compactor changes the form of the circuit to allow fewer levels. This happens when even size templates are applied to
change the form of the circuit to facilitate further level compaction. Unless this is done, the circuit in Figure 6A
cannot be compacted to have fewer than 10 logic levels. This is because qubit d is used 10 times as a control/target.
If the inputs need not be recovered, the depth of such a computation is only 5 logic levels, and the number of gates
required is 9.
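
The bound of 10 levels follows from a counting argument: two gates that touch the same qubit cannot share a level, so as long as the gates themselves are kept, the busiest qubit bounds the depth from below. A minimal Python sketch of this count follows; the gate representation (a tuple of the qubit labels a gate touches) and the example circuit are hypothetical, not the mod5 circuit of Figure 6.

```python
from collections import Counter


def depth_lower_bound(circuit):
    """Return the largest number of gates that touch any single qubit."""
    uses = Counter(qubit for gate in circuit for qubit in gate)
    return max(uses.values(), default=0)


if __name__ == "__main__":
    # Every gate touches qubit "d", so no two gates can share a level.
    circuit = [("a", "d"), ("b", "d"), ("c", "d"), ("d",)]
    print(depth_lower_bound(circuit))
```

Rewriting the circuit with templates can lower this count, which is why changing the form of the circuit can allow fewer levels.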

Finally, we applied the simplification procedure to some levelled quantum circuits for adder, comparator and
modular exponentiation type functions (the latter is an important part of Shor's factoring algorithm) reported in
[5, 17]. We took their circuits, substituted quantum implementations of the Toffoli gates where needed, simplified
them and compacted levels (treating each circuit as non-levelled). In the circuit with Fredkin gates ([17], Fig. 4) we
used the CNOT-Toffoli-CNOT decomposition of the Fredkin gate, and in the circuit with single negative control Toffoli
gates ([17], Fig. 5), we used the circuit from Figure 4. The results are reported for 3 circuits that can be found in
[5] and 3 circuits from [17] (Table 3).
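
The CNOT-Toffoli-CNOT decomposition of the Fredkin gate can be verified on all basis states. In the Python sketch below, the function names and the choice of which swapped line serves as the CNOT control are assumptions made for illustration.

```python
from itertools import product


def cnot(bits, control, target):
    """Flip `target` when `control` is 1."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def toffoli(bits, c1, c2, target):
    """Flip `target` when both controls are 1."""
    out = list(bits)
    out[target] ^= out[c1] & out[c2]
    return tuple(out)


def fredkin(bits, control, a, b):
    """Reference Fredkin gate: swap lines a and b when `control` is 1."""
    out = list(bits)
    if out[control]:
        out[a], out[b] = out[b], out[a]
    return tuple(out)


def fredkin_decomposed(bits, control, a, b):
    """CNOT(b -> a), Toffoli(control, a -> b), CNOT(b -> a)."""
    bits = cnot(bits, b, a)
    bits = toffoli(bits, control, a, b)
    return cnot(bits, b, a)


def decomposition_is_correct():
    return all(
        fredkin(bits, 0, 1, 2) == fredkin_decomposed(bits, 0, 1, 2)
        for bits in product((0, 1), repeat=3)
    )
```

Only the middle gate is a Toffoli gate, so only that gate needs to be replaced by its quantum implementation before the templates are applied.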

7 Future work 

There are several possibilities to improve our simplification approach. We are interested in developing a smart
automated procedure for substituting quantum circuits for multiple control Toffoli gates. The search for new templates
can be accomplished by finding all identities of a given size and applying the known templates to simplify them. All
identities that do not simplify are new templates. Such a search method is also suitable for proving the completeness
of the set of templates found.

As far as level compaction goes, we presented a very simple and greedy algorithm. We expect that our results for 
the number of levels can be improved through use of a smarter level compaction algorithm. However, we believe that 
the templates could still serve as an efficient core for such improved level compactor. 
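
For illustration, a simple greedy compaction can be sketched as follows. This Python sketch is an assumption and not the algorithm used in the experiments: it places each gate in the earliest level in which all of its qubits are free, keeps the gate order on every qubit, and does not use templates to move gates past each other. A gate is a hypothetical tuple of the qubit labels it touches.

```python
def compact_levels(circuit):
    """Greedily assign gates to levels, preserving gate order on each qubit."""
    next_free = {}  # qubit -> index of the first level in which it is free
    levels = []
    for gate in circuit:
        level = max((next_free.get(q, 0) for q in gate), default=0)
        if level == len(levels):
            levels.append([])  # open a new level at the end
        levels[level].append(gate)
        for q in gate:
            next_free[q] = level + 1
    return levels


if __name__ == "__main__":
    circuit = [("a", "b"), ("c", "d"), ("b", "c"), ("a",), ("d",)]
    for index, level in enumerate(compact_levels(circuit)):
        print(index, level)
```

In the example, five gates fit into two levels. A smarter compactor could additionally reorder or rewrite gates, for instance with templates, before scheduling.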

Finally, we are interested in extending the experimental results of the templates application to other sets of
quantum gates including rotation gates and elementary pulses (NMR quantum technology; this will be a
technology-specific optimization), and to account for different architectures (which should be straightforward since
each undesirable gate can be punished with a high cost). Since the templates definition is based on the properties of
matrix multiplication only, they can be applied in any quantum gate library, and for any cost metric.

Figure 6: Circuit for the oracle mod5.

Table 3: Simplification of the benchmarks from [5, 17]. Name shows where the initial circuit can be found, Size lists
the number of qubits used, NCV GC lists the number of NCV gates required and Levels shows the number of levels
(each level with a Toffoli gate considered to have width 5). Our results for the number of gates and the number of
levels are listed in columns Optimized NCV GC and Optimized levels. The final column presents the total runtime
(elapsed time) required by our software to complete the circuit simplification and compact the levels when run on an
Athlon XP2400+ machine with 512 MB of RAM under Windows.

| Name         | Size | NCV GC | Levels | Optimized NCV GC | Optimized levels | Runtime   |
|--------------|------|--------|--------|------------------|------------------|-----------|
| [5], Fig. 5  | 35   | 368    | 86     | 303              | 53               | 1.883 sec |
| [5], Fig. 6  | 24   | 172    | 49     | 110              | 27               | 0.341 sec |
| [5], Fig. 7  | 26   | 337    | 101    | 287              | 61               | 1.903 sec |
| [17], Fig. 2 | 10   | 60     | 47     | 34               | 20               | 0.07 sec  |
| [17], Fig. 4 | 15   | 70     | 44     | 58               | 23               | 0.210 sec |
| [17], Fig. 5 | 30   | 168    | 37     | 112              | 21               | 0.301 sec |

8 Conclusion 

We have introduced quantum templates and demonstrated how they can be applied for quantum circuit simplification 
and level compaction. Templates can be developed for any type of quantum circuit, and can be applied for various cost
metrics (simple gate count, weighted gate count, non-linear metrics). We implemented our algorithms in C++ and 
demonstrated the effectiveness of our approach using a variety of previously published circuits. In our tests, we first 
target gate minimization and then compact the logic levels in the simplified circuit. In particular, we reduced the sizes 
and number of logic levels in the best known multiple control Toffoli gate quantum realizations (including multiple 
control Toffoli gates with negative controls) and in a number of arithmetic quantum circuits presented by previous 
authors. 

Appendix 

Consistency of the template definition is based on the following four lemmas. 

Lemma 1: For any circuit $G_0 G_1 \ldots G_{m-1}$ realizing a quantum function $f$, the circuit
$G_{m-1}^{-1} G_{m-2}^{-1} \ldots G_0^{-1}$ is a realization of $f^{-1}$.

Proof: This statement follows from the properties of the matrix multiplication operation. □

Lemma 2: For any rewriting rule $G_1 G_2 \ldots G_k \rightarrow G_{k+1} G_{k+2} \ldots G_{k+s}$, its gates satisfy
$G_1 G_2 \ldots G_k G_{k+s}^{-1} G_{k+s-1}^{-1} \ldots G_{k+1}^{-1} = I$, where $I$ denotes the identity matrix
(transformation).

Proof: The following set of equalities, constructed using the rule $G G^{-1} = I$ for a single gate $G$, proves the
statement.

\[
\begin{aligned}
G_1 G_2 \ldots G_k &= G_{k+1} G_{k+2} \ldots G_{k+s} \\
G_1 G_2 \ldots G_k G_{k+s}^{-1} G_{k+s-1}^{-1} \ldots G_{k+1}^{-1} &= G_{k+1} G_{k+2} \ldots G_{k+s} G_{k+s}^{-1} G_{k+s-1}^{-1} \ldots G_{k+1}^{-1} \\
G_1 G_2 \ldots G_k G_{k+s}^{-1} G_{k+s-1}^{-1} \ldots G_{k+1}^{-1} &= I.
\end{aligned}
\]

□

Lemma 3: For an identity $G_0 G_1 \ldots G_{m-1}$ and any parameter $p$, $0 \le p \le m-1$,
$G_0 G_1 \ldots G_{p-1} \rightarrow G_{m-1}^{-1} G_{m-2}^{-1} \ldots G_p^{-1}$ is a rewriting rule.

Proof: The proof of this statement follows from the previous one by renaming the subscripts and listing the equalities
in the reverse order. □

Lemma 4: If $G_0 G_1 \ldots G_{m-1} = I$, then $G_1 \ldots G_{m-1} G_0 = I$.

Proof: The following proves the statement.

\[
\begin{aligned}
G_0 G_1 \ldots G_{m-1} &= I \\
G_0^{-1} G_0 G_1 \ldots G_{m-1} &= G_0^{-1} I \\
G_1 \ldots G_{m-1} &= G_0^{-1} \\
G_1 \ldots G_{m-1} G_0 &= G_0^{-1} G_0 \\
G_1 \ldots G_{m-1} G_0 &= I.
\end{aligned}
\]

□
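
Lemmas 3 and 4 can be checked on a small example. The Python sketch below is illustrative only: it restricts itself to NOT and CNOT gates (which are self-inverse and act as permutations of basis states), and the five-gate identity and all function names are assumptions chosen for the example.

```python
from itertools import product


def apply_gate(gate, bits):
    """Apply ("NOT", t) or ("CNOT", c, t) to a tuple of classical bits."""
    out = list(bits)
    if gate[0] == "NOT":
        out[gate[1]] ^= 1
    else:
        _, control, target = gate
        out[target] ^= out[control]
    return tuple(out)


def run(circuit, bits):
    for gate in circuit:
        bits = apply_gate(gate, bits)
    return bits


def equivalent(first, second, n):
    """True if both circuits act identically on every n-bit basis state."""
    return all(run(first, b) == run(second, b) for b in product((0, 1), repeat=n))


# A size-5 identity on two lines: NOT(0) CNOT(0,1) NOT(0) CNOT(0,1) NOT(1).
IDENTITY = [("NOT", 0), ("CNOT", 0, 1), ("NOT", 0), ("CNOT", 0, 1), ("NOT", 1)]


def lemma4_holds(circuit, n):
    """Every cyclic shift of an identity is again an identity."""
    return all(
        equivalent(circuit[s:] + circuit[:s], [], n) for s in range(len(circuit))
    )


def lemma3_holds(circuit, n):
    """The first p gates equal the reversed inverses of the remaining gates."""
    return all(
        # NOT and CNOT are self-inverse, so inverting only reverses the order
        equivalent(circuit[:p], list(reversed(circuit[p:])), n)
        for p in range(len(circuit))
    )
```

Both checks succeed for the example identity, so any of its cyclic shifts can be split at any position into a rewriting rule.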